Abstract

We consider distributed iterative algorithms for the averaging problem over
time-varying topologies. Our focus is on the convergence time of such algorithms when
complete (unquantized) information is available, and on the degradation of
performance when only quantized information is available. We study a large and natural
class of averaging algorithms, which includes the vast majority of algorithms
proposed to date, and provide tight polynomial bounds on their convergence time.
We also describe an algorithm within this class whose convergence time is the best 
among currently available averaging algorithms for time-varying topologies. We 
then propose and analyze distributed averaging algorithms under the additional 
constraint that agents can only store and communicate quantized information, so 
that they can only converge to the average of the initial values of the agents within 
some error. We establish bounds on the error and tight bounds on the convergence 
time, as a function of the number of quantization levels. 
1 Introduction



There has been much recent interest in distributed control and coordination of networks 
consisting of multiple, potentially mobile, agents. This is motivated mainly by the 
emergence of large-scale networks, characterized by the lack of centralized access to
information and time-varying connectivity. Control and optimization algorithms deployed
in such networks should be completely distributed, relying only on local observations 
and information, and robust against unexpected changes in topology such as link or 
node failures. 

A canonical problem in distributed control is the consensus problem. The objective in 
the consensus problem is to develop distributed algorithms that can be used by a group 
of agents in order to reach agreement (consensus) on a common decision (represented 
by a scalar or a vector value). The agents start with some different initial decisions 
and communicate them locally under some constraints on connectivity and inter-agent 
information exchange. The consensus problem arises in a number of applications includ- 
ing coordination of UAVs (e.g., aligning the agents' directions of motion), information 
processing in sensor networks, and distributed optimization (e.g., agreeing on the esti- 
mates of some unknown parameters). The averaging problem is a special case in which 
the goal is to compute the exact average of the initial values of the agents. A natural 
and widely studied consensus algorithm, proposed and analyzed by Tsitsiklis [IB] and 
Tsitsiklis et al. [15] , involves, at each time step, every agent taking a weighted average 
of its own value with values received from some of the other agents. Similar algorithms 
have been studied in the load-balancing literature (see for example [8]). Motivated by 
observed group behavior in biological and dynamical systems, the recent literature in 
cooperative control has studied similar algorithms and proved convergence results under 
various assumptions on agent connectivity and information exchange (see [20], [15], [17] . 
[H], [13]). 

In this paper, our goal is to provide tight bounds on the convergence time (defined as 
the number of iterations required to reduce a suitable Lyapunov function by a constant 
factor) of a general class of consensus algorithms, as a function of the number n of 
agents. We focus on algorithms that are designed to solve the averaging problem. We 
consider both problems where agents have access to exact values and problems where 
agents only have access to quantized values of the other agents. Our contributions can 
be summarized as follows. 

In the first part of the paper, we consider the case where agents can exchange and 
store continuous values, which is a widely adopted assumption in the previous literature. 
We consider a large class of averaging algorithms defined by the condition that the 
weight matrix is a possibly nonsymmetric, doubly stochastic matrix. For this class of 
algorithms, we prove that the convergence time is $O(n^2/\eta)$, where $n$ is the number of
agents and $\eta$ is a lower bound on the nonzero weights used in the algorithm. To the
best of our knowledge, this is the best polynomial-time bound on the convergence time
of such algorithms. We also show that this bound is tight. Since all previously studied
linear schemes force $\eta$ to be of the order of $1/n$, this result implies an $O(n^3)$ bound
on convergence time. In Section 4, we present a distributed algorithm that selects the
weights dynamically, using three-hop neighborhood information. Under the assumption
that the underlying connectivity graph at each iteration is undirected, we establish an
improved $O(n^2)$ upper bound on convergence time. This matches the best currently
available convergence time guarantee for the much simpler case of static connectivity
graphs [16].

In the second part of the paper, we impose the additional constraint that agents can 
only store and transmit quantized values. This model provides a good approximation 
for communication networks that are subject to communication bandwidth or storage 
constraints. We focus on a particular quantization rule, which rounds down the values to 
the nearest quantization level. We propose a distributed algorithm that uses quantized 
values and, using a slightly different Lyapunov function, we show that the algorithm 
guarantees the convergence of the values of the agents to a common value. In particular, 
we prove that all agents have the same value after $O((n^2/\eta) \log(nQ))$ time steps, where
Q is the number of quantization levels per unit value. Due to the rounding-down feature 
of the quantizer, this algorithm does not preserve the average of the values at each 
iteration. However, we provide bounds on the error between the final consensus value 
and the initial average, as a function of the number Q of available quantization levels. 
In particular, we show that the error goes to 0 at a rate of $(\log Q)/Q$, as the number $Q$
of quantization levels increases to infinity. 

Other than the papers cited above, our work is also related to [TT] and 0, [5], which 
studied the effects of quantization on the performance of averaging algorithms.
Kashyap et al. proposed randomized gossip-type quantized averaging algorithms under
the assumption that each agent value is an integer. They showed that these algorithms 
preserve the average of the values at each iteration and converge to approximate consen- 
sus. They also provided bounds on the convergence time of these algorithms for specific 
static topologies (fully connected and linear networks). In the recent work [5], Carli et al. 
proposed a distributed algorithm that uses quantized values and preserves the average at 
each iteration. They showed favorable convergence properties using simulations on some 
static topologies, and provided performance bounds for the limit points of the generated 
iterates. Our results on quantized averaging algorithms differ from these works in that 
we study a more general case of time-varying topologies, and provide tight polynomial 
bounds on both the convergence time and the discrepancy from the initial average, in 
terms of the number of quantization levels. 

The paper is organized as follows. In Section 2, we introduce a general class of
averaging algorithms, and present our assumptions on the algorithm parameters and on
the information exchange among the agents. In Section 3, we present our main result
on the convergence time of the averaging algorithms under consideration. In Section 4,
we present a distributed averaging algorithm for the case of undirected graphs, which
picks the weights dynamically, resulting in an improved bound on the convergence time.
In Section 5, we propose and analyze a quantized version of the averaging algorithm. In
particular, we establish bounds on the convergence time of the iterates, and on the error 
between the final value and the average of the initial values of the agents. Finally, we 
give our concluding remarks in Section O 
2 A Class of Averaging Algorithms



We consider a set $N = \{1, 2, \ldots, n\}$ of agents, which will henceforth be referred to as
"nodes." Each node $i$ starts with a scalar value $x_i(0)$. At each nonnegative integer time
$k$, node $i$ receives from some of the other nodes $j$ a message with the value of $x_j(k)$, and
updates its value according to:

$$x_i(k+1) = \sum_{j=1}^{n} a_{ij}(k)\, x_j(k), \tag{1}$$

where the $a_{ij}(k)$ are nonnegative weights with the property that $a_{ij}(k) > 0$ only if node
$i$ receives information from node $j$ at time $k$. We use the notation $A(k)$ to denote the
weight matrix $[a_{ij}(k)]_{i,j=1,\ldots,n}$, so that our update equation is

$$x(k+1) = A(k)\, x(k).$$

Given a matrix $A$, we use $\mathcal{E}(A)$ to denote the set of directed edges $(j, i)$, including
self-edges $(i, i)$, such that $a_{ij} > 0$. At each time $k$, the nodes' connectivity can be represented
by the directed graph $G(k) = (N, \mathcal{E}(A(k)))$.

Our goal is to study the convergence of the iterates $x_i(k)$ to the average of the initial
values, $(1/n) \sum_{i=1}^{n} x_i(0)$, as $k$ approaches infinity. In order to establish such convergence,
we impose some assumptions on the weights $a_{ij}(k)$ and the graph sequence $G(k)$.

Assumption 1 For each $k$, the weight matrix $A(k)$ is a doubly stochastic matrix$^{1}$ with
positive diagonal entries. Additionally, there exists a constant $\eta > 0$ such that if $a_{ij}(k) > 0$,
then $a_{ij}(k) \ge \eta$.

The double stochasticity assumption on the weight matrix guarantees that the
average of the node values remains the same at each iteration (cf. the proof of Lemma 1
below). The second part of this assumption states that each node gives significant
weight to its values and to the values of its neighbors at each time $k$.

Our next assumption ensures that the graph sequence $G(k)$ is sufficiently connected
for the nodes to repeatedly influence each other's values.

Assumption 2 There exists an integer $B \ge 1$ such that the directed graph

$$\bigl(N,\ \mathcal{E}(A(kB)) \cup \mathcal{E}(A(kB+1)) \cup \cdots \cup \mathcal{E}(A((k+1)B-1))\bigr)$$

is strongly connected for all nonnegative integers $k$.

Any algorithm of the form given in Eq. (1) with the sequence of weights $a_{ij}(k)$
satisfying Assumptions 1 and 2 solves the averaging problem. This is formalized in the
following proposition.

1 A matrix is called doubly stochastic if it is nonnegative and all of its rows and columns sum to 1.
Proposition 1 Let Assumptions 1 and 2 hold. Let $\{x(k)\}$ be generated by the algorithm
(1). Then, for all $i$, we have

$$\lim_{k \to \infty} x_i(k) = \frac{1}{n} \sum_{j=1}^{n} x_j(0).$$

This fact is a minor modification of known results in [181 H3 CO, E], where the 
convergence of each $x_i(k)$ to the same value is established under weaker versions of
Assumptions 1 and 2. The fact that the limit is the average of the entries of the vector
x(0) follows from the fact that multiplication of a vector by a doubly stochastic matrix 
preserves the average of the vector's components. 
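
The update rule and the average-preservation property can be illustrated numerically. The following is only a minimal sketch, not part of the original analysis: the 4-node matrix `A`, the initial vector `x0`, and the helper names `step` and `is_doubly_stochastic` are hypothetical choices made for this example. The matrix is nonsymmetric, doubly stochastic, has positive diagonal entries, and its graph is strongly connected, so it satisfies Assumptions 1 and 2 with $B = 1$ and $\eta = 1/2$.

```python
def step(A, x):
    """One update x(k+1) = A(k) x(k), computed row by row."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def is_doubly_stochastic(A, tol=1e-12):
    """Check nonnegativity and that every row and column sums to 1."""
    n = len(A)
    nonneg = all(a >= 0 for row in A for a in row)
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in A)
    cols_ok = all(abs(sum(A[i][j] for i in range(n)) - 1.0) < tol
                  for j in range(n))
    return nonneg and rows_ok and cols_ok


# Hypothetical example: directed cycle with self-loops (nonsymmetric weights).
A = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.5, 0.0, 0.0, 0.5]]
x0 = [4.0, 0.0, -2.0, 6.0]
average = sum(x0) / len(x0)

x = x0
for _ in range(200):
    x = step(A, x)
    # The average is preserved at every iteration (up to rounding).
    assert abs(sum(x) / len(x) - average) < 1e-9
```

After enough iterations every entry of `x` is close to `average`, which is what Proposition 1 predicts for this time-invariant special case.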

Recent research has focused on methods of choosing weights $a_{ij}(k)$ that satisfy
Assumptions 1 and 2, and minimize the convergence time of the resulting averaging
algorithm (see [21] for the case of static graphs, see [15] and [2] for the case of symmetric
weights, i.e., weights satisfying $a_{ij}(k) = a_{ji}(k)$, and also see [7J 0]). For static graphs,
some recent results on optimal time-invariant algorithms may be found in [16].



3 Convergence time 

In this section, we give an analysis of the convergence time of averaging algorithms
of the form (1). Our goal is to obtain tight estimates of the convergence time, under
Assumptions 1 and 2.

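As a convergence measure, we use the "sample variance" of a vector $x \in \mathbb{R}^n$, defined
as

$$V(x) = \sum_{i=1}^{n} (x_i - \bar{x})^2,$$

where $\bar{x}$ is the average of the entries of $x$:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$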



Let x(k) denote the vector of node values at time k [i.e., the vector of iterates 
generated by algorithm (1) at time $k$]. We are interested in providing an upper bound
on the number of iterations it takes for the "sample variance" V(x(k)) to decrease to a 
small fraction of its initial value V(x(0)). We first establish some technical preliminaries 
that will be key in the subsequent analysis. In particular, in the next subsection, we 
explore several implications of the double stochasticity assumption on the weight matrix 
A(k). 



3.1 Preliminaries on Doubly Stochastic Matrices 

We begin by analyzing how the sample variance V(x) changes when the vector x is 
multiplied by a doubly stochastic matrix $A$. The next lemma shows that $V(Ax) \le V(x)$.
Thus, under Assumption 1, the sample variance $V(x(k))$ is nonincreasing in $k$, and
V(x(k)) can be used as a Lyapunov function. 
Lemma 1 Let $A$ be a doubly stochastic matrix. Then,$^{2}$ for all $x \in \mathbb{R}^n$,

$$V(Ax) = V(x) - \sum_{i<j} w_{ij} (x_i - x_j)^2,$$

where $w_{ij}$ is the $(i,j)$-th entry of the matrix $A^T A$.

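Proof. Let $\mathbf{1}$ denote the vector in $\mathbb{R}^n$ with all entries equal to 1. The double stochasticity
of $A$ implies

$$A\mathbf{1} = \mathbf{1}, \qquad \mathbf{1}^T A = \mathbf{1}^T.$$

Note that multiplication by a doubly stochastic matrix $A$ preserves the average of the
entries of a vector, i.e., for any $x \in \mathbb{R}^n$, there holds

$$\overline{Ax} = \frac{1}{n} \mathbf{1}^T A x = \frac{1}{n} \mathbf{1}^T x = \bar{x}.$$

We now write the quadratic form $V(x) - V(Ax)$ explicitly, as follows:

$$\begin{aligned}
V(x) - V(Ax) &= (x - \bar{x}\mathbf{1})^T (x - \bar{x}\mathbf{1}) - (Ax - \overline{Ax}\,\mathbf{1})^T (Ax - \overline{Ax}\,\mathbf{1}) \\
&= (x - \bar{x}\mathbf{1})^T (x - \bar{x}\mathbf{1}) - (Ax - \bar{x} A\mathbf{1})^T (Ax - \bar{x} A\mathbf{1}) \\
&= (x - \bar{x}\mathbf{1})^T (I - A^T A)(x - \bar{x}\mathbf{1}).
\end{aligned} \tag{2}$$

Let $w_{ij}$ be the $(i,j)$-th entry of $A^T A$. Note that $A^T A$ is symmetric and stochastic,
so that $w_{ij} = w_{ji}$ and $w_{ii} = 1 - \sum_{j \ne i} w_{ij}$. Then, it can be verified that

$$A^T A = I - \sum_{i<j} w_{ij} (e_i - e_j)(e_i - e_j)^T, \tag{3}$$

where $e_i$ is a unit vector with the $i$-th entry equal to 1, and all other entries equal to 0
(see also [22], where a similar decomposition was used).
By combining Eqs. (2) and (3), we obtain

$$\begin{aligned}
V(x) - V(Ax) &= (x - \bar{x}\mathbf{1})^T \Bigl( \sum_{i<j} w_{ij} (e_i - e_j)(e_i - e_j)^T \Bigr) (x - \bar{x}\mathbf{1}) \\
&= \sum_{i<j} w_{ij} (x_i - x_j)^2.
\end{aligned}$$

■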

Note that the entries $w_{ij}(k)$ of $A(k)^T A(k)$ are nonnegative, because the weight matrix
$A(k)$ has nonnegative entries. In view of this, Lemma 1 implies that

$$V(x(k+1)) \le V(x(k)) \quad \text{for all } k.$$

Moreover, the amount of variance decrease is given by

$$V(x(k)) - V(x(k+1)) = \sum_{i<j} w_{ij}(k) \bigl(x_i(k) - x_j(k)\bigr)^2.$$



2 In the sequel, the notation $\sum_{i<j}$ will be used to denote the double sum $\sum_{j=1}^{n} \sum_{i=1}^{j-1}$.
We will use this result to provide a lower bound on the amount of decrease of the sample
variance $V(x(k))$ in between iterations.
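
The variance-decrease identity of Lemma 1 is easy to check numerically. The sketch below is illustrative only: the matrix `A` and vector `x` are hypothetical test data, and the helper names are introduced here rather than taken from the paper. It computes $V(x) - V(Ax)$ directly and compares it with $\sum_{i<j} w_{ij}(x_i - x_j)^2$, where $w_{ij}$ is the $(i,j)$-th entry of $A^T A$.

```python
def sample_variance(x):
    """V(x) = sum_i (x_i - mean)^2, with no 1/n normalization."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def gram(A):
    """W = A^T A, i.e. w_ij = sum_k a_ki * a_kj."""
    n = len(A)
    return [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def predicted_decrease(A, x):
    """Right-hand side of Lemma 1: sum over i < j of w_ij (x_i - x_j)^2."""
    W = gram(A)
    n = len(x)
    return sum(W[i][j] * (x[i] - x[j]) ** 2
               for j in range(n) for i in range(j))


# Hypothetical doubly stochastic (nonsymmetric) test matrix and vector.
A = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.5, 0.0, 0.0, 0.5]]
x = [4.0, 0.0, -2.0, 6.0]

actual = sample_variance(x) - sample_variance(matvec(A, x))
predicted = predicted_decrease(A, x)
```

For this data both quantities equal 22: the variance drops from 40 to 18.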

Since every positive entry of $A(k)$ is at least $\eta$, it follows that every positive entry of
$A(k)^T A(k)$ is at least $\eta^2$. Therefore, it is immediate that

if $w_{ij}(k) > 0$, then $w_{ij}(k) \ge \eta^2$.

In our next lemma, we establish a stronger lower bound. In particular, we find it useful
to focus not on an individual $w_{ij}$, but rather on all $w_{ij}$ associated with edges that
cross a particular cut in the graph $(N, \mathcal{E}(A^T A))$. For such groups of $w_{ij}$, we prove a
lower bound which is linear in $\eta$, as seen in the following.

Lemma 2 Let $A$ be a row-stochastic matrix with positive diagonal entries, and assume
that the smallest positive entry in $A$ is at least $\eta$. Also, let $(S^-, S^+)$ be a partition of the
set $N = \{1, \ldots, n\}$ into two disjoint sets. If

$$\sum_{i \in S^-,\ j \in S^+} w_{ij} > 0,$$

then

$$\sum_{i \in S^-,\ j \in S^+} w_{ij} \ge \frac{\eta}{2}.$$

Proof. Let $\sum_{i \in S^-,\ j \in S^+} w_{ij} > 0$. From the definition of the weights $w_{ij}$, we have
$w_{ij} = \sum_{k} a_{ki} a_{kj}$, which shows that there exist $i \in S^-$, $j \in S^+$, and some $k$ such that
$a_{ki} > 0$ and $a_{kj} > 0$. For either case where $k$ belongs to $S^-$ or $S^+$, we see that there
exists an edge in the set $\mathcal{E}(A)$ that crosses the cut $(S^-, S^+)$. Let $(i^*, j^*)$ be such an edge.
Without loss of generality, we assume that $i^* \in S^-$ and $j^* \in S^+$.
We define

$$C_{j^*}^{+} = \sum_{i \in S^+} a_{j^* i}, \qquad C_{j^*}^{-} = \sum_{i \in S^-} a_{j^* i}.$$

See Figure 1(a) for an illustration. Since $A$ is a row-stochastic matrix, we have

$$C_{j^*}^{-} + C_{j^*}^{+} = 1,$$

implying that at least one of the following is true:

Case (a): $C_{j^*}^{-} \ge \frac{1}{2}$,

Case (b): $C_{j^*}^{+} \ge \frac{1}{2}$.

We consider these two cases separately. In both cases, we focus on a subset of the edges
and we use the fact that the elements $w_{ij}$ correspond to paths of length 2, with one step
in $\mathcal{E}(A)$ and another in $\mathcal{E}(A^T)$.
Figure 1: (a) Intuitively, $C_{j^*}^{+}$ measures how much weight $j^*$ assigns to nodes in $S^+$ (including
itself), and $C_{j^*}^{-}$ measures how much weight $j^*$ assigns to nodes in $S^-$. Note that the edge
$(j^*, j^*)$ is also present, but not shown. (b) For the case where $C_{j^*}^{-} \ge 1/2$, we only focus on
two-hop paths between $j^*$ and elements $i \in S^-$ obtained by taking $(i, j^*)$ as the first step and
the self-edge $(j^*, j^*)$ as the second step. (c) For the case where $C_{j^*}^{+} \ge 1/2$, we only focus on
two-hop paths between $i^*$ and elements $j \in S^+$ obtained by taking $(i^*, j^*)$ as the first step in
$\mathcal{E}(A)$ and $(j^*, j)$ as the second step in $\mathcal{E}(A^T)$.



Case (a): $C_{j^*}^{-} \ge 1/2$.

We focus on those $w_{ij}$ with $i \in S^-$ and $j = j^*$. Indeed, since all $w_{ij}$ are nonnegative, we
have

$$\sum_{i \in S^-,\ j \in S^+} w_{ij} \ge \sum_{i \in S^-} w_{i j^*}. \tag{4}$$

For each element in the sum on the right-hand side, we have

$$w_{i j^*} = \sum_{k=1}^{n} a_{ki}\, a_{k j^*} \ge a_{j^* i}\, a_{j^* j^*} \ge \eta\, a_{j^* i},$$

where the inequalities follow from the facts that $A$ has nonnegative entries, its diagonal
entries are positive, and its positive entries are at least $\eta$. Consequently,

$$\sum_{i \in S^-} w_{i j^*} \ge \eta \sum_{i \in S^-} a_{j^* i} = \eta\, C_{j^*}^{-}. \tag{5}$$

Combining Eqs. (4) and (5), and recalling the assumption $C_{j^*}^{-} \ge 1/2$, the result follows.
An illustration of this argument can be found in Figure 1(b).
Case (b): $C_{j^*}^{+} \ge 1/2$.

We focus on those $w_{ij}$ with $i = i^*$ and $j \in S^+$. We have

$$\sum_{i \in S^-,\ j \in S^+} w_{ij} \ge \sum_{j \in S^+} w_{i^* j}, \tag{6}$$

since all $w_{ij}$ are nonnegative. For each element in the sum on the right-hand side, we
have

$$w_{i^* j} = \sum_{k=1}^{n} a_{k i^*}\, a_{k j} \ge a_{j^* i^*}\, a_{j^* j} \ge \eta\, a_{j^* j},$$

where the inequalities follow because all entries of $A$ are nonnegative, and because the
choice $(i^*, j^*) \in \mathcal{E}(A)$ implies that $a_{j^* i^*} \ge \eta$. Consequently,

$$\sum_{j \in S^+} w_{i^* j} \ge \eta \sum_{j \in S^+} a_{j^* j} = \eta\, C_{j^*}^{+}. \tag{7}$$

Combining Eqs. (6) and (7), and recalling the assumption $C_{j^*}^{+} \ge 1/2$, the result follows.
An illustration of this argument can be found in Figure 1(c). ■
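
The cut bound of Lemma 2 can be checked by brute force on small matrices. The following sketch is an illustration under stated assumptions: the 3-by-3 matrix `A` is a hypothetical row-stochastic matrix with positive diagonal (it is deliberately not doubly stochastic, since the lemma does not need that), and `check_lemma2` is a helper introduced here. It enumerates every cut $(S^-, S^+)$ and verifies that the total weight of $A^T A$ across the cut is either zero or at least $\eta/2$.

```python
from itertools import combinations


def gram(A):
    """W = A^T A, i.e. w_ij = sum_k a_ki * a_kj."""
    n = len(A)
    return [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def min_positive_entry(A):
    return min(a for row in A for a in row if a > 0)


def cut_weight(W, s_minus):
    """Sum of w_ij over i in S^- and j in S^+ (the complement of S^-)."""
    n = len(W)
    s_plus = [j for j in range(n) if j not in s_minus]
    return sum(W[i][j] for i in s_minus for j in s_plus)


def check_lemma2(A, tol=1e-12):
    """Return True if every cut has weight 0 or at least eta / 2."""
    n = len(A)
    W = gram(A)
    eta = min_positive_entry(A)
    for size in range(1, n):
        for s_minus in combinations(range(n), size):
            total = cut_weight(W, s_minus)
            if total > tol and total < eta / 2 - tol:
                return False
    return True


# Hypothetical row-stochastic matrix with positive diagonal, eta = 0.1.
A = [[0.9, 0.1, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.1, 0.9]]
lemma2_holds = check_lemma2(A)
```

Here the cut $S^- = \{0\}$ has weight $w_{01} + w_{02} = 0.09$, which exceeds $\eta/2 = 0.05$ even though the individual bound $\eta^2 = 0.01$ is much weaker.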

3.2 A Bound on Convergence Time 

With the preliminaries on doubly stochastic matrices in place, we can now proceed to 
derive bounds on the decrease of V(x(k)) in between iterations. We will first somewhat 
relax our connectivity assumptions. In particular, we consider the following relaxation 
of Assumption 2.

Assumption 3 Given an integer $k \ge 0$, suppose that the components of $x(kB)$ have
been reordered so that they are in nonincreasing order. We assume that for every
$d \in \{1, \ldots, n-1\}$, we either have $x_d(kB) = x_{d+1}(kB)$, or there exist some time
$t \in \{kB, \ldots, (k+1)B - 1\}$ and some $i \in \{1, \ldots, d\}$, $j \in \{d+1, \ldots, n\}$ such that
$(i, j)$ or $(j, i)$ belongs to $\mathcal{E}(A(t))$.

Lemma 3 Assumption 2 implies Assumption 3, with the same value of $B$.

Proof. If Assumption 3 does not hold, then there must exist an index $d$ [for which
$x_d(kB) \ne x_{d+1}(kB)$ holds] such that there are no edges between nodes $1, 2, \ldots, d$ and
nodes $d+1, \ldots, n$ during times $t = kB, \ldots, (k+1)B - 1$. But this implies that the
graph

$$\bigl(N,\ \mathcal{E}(A(kB)) \cup \mathcal{E}(A(kB+1)) \cup \cdots \cup \mathcal{E}(A((k+1)B-1))\bigr)$$

is disconnected, which violates Assumption 2. ■

For our convergence time results, we will use the weaker Assumption 3, rather than
the stronger Assumption 2. Later on, in Section 4, we will exploit the sufficiency of
Assumption 3 to design a decentralized algorithm for selecting the weights $a_{ij}(k)$, which
satisfies Assumption 3, but not Assumption 2.

We now proceed to bound the decrease of our Lyapunov function V(x(k)) during 
the interval $[kB, (k+1)B - 1]$. In what follows, we denote by $V(k)$ the sample variance
V(x(k)) at time k. 

Lemma 4 Let Assumptions 1 and 3 hold. Let $\{x(k)\}$ be generated by the update rule
(1). Suppose that the components $x_i(kB)$ of the vector $x(kB)$ have been ordered from
largest to smallest, with ties broken arbitrarily. Then, we have

$$V(kB) - V((k+1)B) \ge \frac{\eta}{2} \sum_{i=1}^{n-1} \bigl(x_i(kB) - x_{i+1}(kB)\bigr)^2.$$
Proof. By Lemma 1, we have for all $t$,

$$V(t) - V(t+1) = \sum_{i<j} w_{ij}(t) \bigl(x_i(t) - x_j(t)\bigr)^2, \tag{8}$$

where $w_{ij}(t)$ is the $(i,j)$-th entry of $A(t)^T A(t)$. Summing up the variance differences
$V(t) - V(t+1)$ over different values of $t$, we obtain

$$V(kB) - V((k+1)B) = \sum_{t=kB}^{(k+1)B-1} \sum_{i<j} w_{ij}(t) \bigl(x_i(t) - x_j(t)\bigr)^2. \tag{9}$$

We next introduce some notation.

(a) For all $d \in \{1, \ldots, n-1\}$, let $t_d$ be the first time larger than or equal to $kB$ (if
it exists) at which there is a communication between two nodes belonging to the
two sets $\{1, \ldots, d\}$ and $\{d+1, \ldots, n\}$, to be referred to as a communication across
the cut $d$.

(b) For all $t \in \{kB, \ldots, (k+1)B - 1\}$, let $D(t) = \{d \mid t_d = t\}$, i.e., $D(t)$ consists
of "cuts" $d \in \{1, \ldots, n-1\}$ such that time $t$ is the first communication time
larger than or equal to $kB$ between nodes in the sets $\{1, \ldots, d\}$ and $\{d+1, \ldots, n\}$.
Because of Assumption 3, the union of the sets $D(t)$ includes all indices $1, \ldots, n-1$,
except possibly for indices for which $x_d(kB) = x_{d+1}(kB)$.

(c) For all $d \in \{1, \ldots, n-1\}$, let $C_d = \{(i, j) \mid i \le d,\ d+1 \le j\}$.

(d) For all $t \in \{kB, \ldots, (k+1)B - 1\}$, let $F_{ij}(t) = \{d \in D(t) \mid (i, j) \text{ or } (j, i) \in C_d\}$,
i.e., $F_{ij}(t)$ consists of all cuts $d$ such that the edge $(i, j)$ or $(j, i)$ at time $t$ is the
first communication across the cut at a time larger than or equal to $kB$.

(e) To simplify notation, let $y_i = x_i(kB)$. By assumption, we have $y_1 \ge \cdots \ge y_n$.

We make two observations, as follows:

(1) Suppose that $d \in D(t)$. Then, for some $(i, j) \in C_d$, we have either $a_{ij}(t) > 0$ or
$a_{ji}(t) > 0$. Because $A(t)$ is nonnegative with positive diagonal entries, we have

$$w_{ij}(t) = \sum_{k=1}^{n} a_{ki}(t)\, a_{kj}(t) \ge a_{ii}(t)\, a_{ij}(t) + a_{ji}(t)\, a_{jj}(t) > 0,$$

and by Lemma 2, we obtain

$$\sum_{(i,j) \in C_d} w_{ij}(t) \ge \frac{\eta}{2}. \tag{10}$$



(2) Fix some $(i, j)$ with $i < j$, and time $t \in \{kB, \ldots, (k+1)B - 1\}$, and suppose
that $F_{ij}(t)$ is nonempty. Let $F_{ij}(t) = \{d_1, \ldots, d_k\}$, where the $d_j$ are arranged in
increasing order. Since $d_1 \in F_{ij}(t)$, we have $d_1 \in D(t)$ and therefore $t_{d_1} = t$. By
the definition of $t_{d_1}$, this implies that there has been no communication between a
node in $\{1, \ldots, d_1\}$ and a node in $\{d_1 + 1, \ldots, n\}$ during the time interval $[kB, t-1]$.
It follows that $x_i(t) \ge y_{d_1}$. By a symmetrical argument, we also have

$$x_j(t) \le y_{d_k + 1}. \tag{11}$$

These relations imply that

$$x_i(t) - x_j(t) \ge y_{d_1} - y_{d_k + 1} \ge \sum_{d \in F_{ij}(t)} (y_d - y_{d+1}).$$

Since the components of $y$ are sorted in nonincreasing order, we have $y_d - y_{d+1} \ge 0$,
for every $d \in F_{ij}(t)$. For any nonnegative numbers $z_i$, we have

$$(z_1 + \cdots + z_k)^2 \ge z_1^2 + \cdots + z_k^2,$$

which implies that

$$\bigl(x_i(t) - x_j(t)\bigr)^2 \ge \sum_{d \in F_{ij}(t)} (y_d - y_{d+1})^2. \tag{12}$$


We now use these two observations to provide a lower bound on the expression on the
right-hand side of Eq. (8) at time $t$. We use Eq. (12) and then Eq. (10), to obtain

$$\begin{aligned}
\sum_{i<j} w_{ij}(t) \bigl(x_i(t) - x_j(t)\bigr)^2
&\ge \sum_{i<j} w_{ij}(t) \sum_{d \in F_{ij}(t)} (y_d - y_{d+1})^2 \\
&= \sum_{d \in D(t)} \sum_{(i,j) \in C_d} w_{ij}(t)\, (y_d - y_{d+1})^2 \\
&\ge \frac{\eta}{2} \sum_{d \in D(t)} (y_d - y_{d+1})^2.
\end{aligned}$$

We now sum both sides of the above inequality for different values of $t$, and use Eq. (9)
to obtain

$$\begin{aligned}
V(kB) - V((k+1)B) &= \sum_{t=kB}^{(k+1)B-1} \sum_{i<j} w_{ij}(t) \bigl(x_i(t) - x_j(t)\bigr)^2 \\
&\ge \frac{\eta}{2} \sum_{t=kB}^{(k+1)B-1} \sum_{d \in D(t)} (y_d - y_{d+1})^2 \\
&= \frac{\eta}{2} \sum_{d=1}^{n-1} (y_d - y_{d+1})^2,
\end{aligned}$$

where the last step follows from the fact that the union of the sets $D(t)$ is only
missing those $d$ for which $y_d = y_{d+1}$. ■

We next establish a bound on the variance decrease that plays a key role in our 
convergence analysis. 

Lemma 5 Let Assumptions 1 and 3 hold, and suppose that $V(kB) > 0$. Then,

$$\frac{V(kB) - V((k+1)B)}{V(kB)} \ge \frac{\eta}{2n^2} \quad \text{for all } k.$$

Proof. Without loss of generality, we assume that the components of $x(kB)$ have been
sorted in nonincreasing order. By Lemma 4, we have

$$V(kB) - V((k+1)B) \ge \frac{\eta}{2} \sum_{i=1}^{n-1} \bigl(x_i(kB) - x_{i+1}(kB)\bigr)^2.$$

This implies that

$$\frac{V(kB) - V((k+1)B)}{V(kB)} \ge \frac{\eta}{2}\, \frac{\sum_{i=1}^{n-1} \bigl(x_i(kB) - x_{i+1}(kB)\bigr)^2}{\sum_{i=1}^{n} \bigl(x_i(kB) - \bar{x}(kB)\bigr)^2}.$$

Observe that the right-hand side does not change when we add a constant to every
$x_i(kB)$. We can therefore assume, without loss of generality, that $\bar{x}(kB) = 0$, so that

$$\frac{V(kB) - V((k+1)B)}{V(kB)} \ge \frac{\eta}{2} \min_{\substack{x_1 \ge x_2 \ge \cdots \ge x_n \\ \sum_i x_i = 0}} \frac{\sum_{i=1}^{n-1} (x_i - x_{i+1})^2}{\sum_{i=1}^{n} x_i^2}.$$

Note that the right-hand side is unchanged if we multiply each $x_i$ by the same constant.
Therefore, we can assume, without loss of generality, that $\sum_{i=1}^{n} x_i^2 = 1$, so that

$$\frac{V(kB) - V((k+1)B)}{V(kB)} \ge \frac{\eta}{2} \min_{\substack{x_1 \ge x_2 \ge \cdots \ge x_n \\ \sum_i x_i = 0,\ \sum_i x_i^2 = 1}} \sum_{i=1}^{n-1} (x_i - x_{i+1})^2. \tag{13}$$

The requirement $\sum_i x_i^2 = 1$ implies that the average value of $x_i^2$ is $1/n$, which implies
that there exists some $j$ such that $|x_j| \ge 1/\sqrt{n}$. Without loss of generality, let us suppose
that this $x_j$ is positive.$^{3}$

The rest of the proof relies on a technique from [12] to provide a lower bound on the
right-hand side of Eq. (13). Let

$$z_i = x_i - x_{i+1} \ \text{ for } i < n, \quad \text{and} \quad z_n = 0.$$



3 Otherwise, we can replace $x$ with $-x$ and subsequently reorder to maintain the property that the
components of $x$ are in descending order. It can be seen that these operations do not affect the objective
value.
Note that $z_i \ge 0$ for all $i$ and

$$\sum_{i=1}^{n} z_i = x_1 - x_n.$$

Since $x_j \ge 1/\sqrt{n}$ for some $j$, we have that $x_1 \ge 1/\sqrt{n}$; since $\sum_{i=1}^{n} x_i = 0$, it follows that
at least one $x_i$ is negative, and therefore $x_n < 0$. This gives us

$$\sum_{i=1}^{n} z_i \ge \frac{1}{\sqrt{n}}.$$

Combining with Eq. (13), we obtain

$$\frac{V(kB) - V((k+1)B)}{V(kB)} \ge \frac{\eta}{2} \min_{z_i \ge 0,\ \sum_i z_i \ge 1/\sqrt{n}} \sum_{i=1}^{n} z_i^2.$$

The minimization problem on the right-hand side is a symmetric convex optimization
problem, and therefore has a symmetric optimal solution, namely $z_i = 1/n^{1.5}$ for all $i$.
This results in an optimal value of $1/n^2$. Therefore,

$$\frac{V(kB) - V((k+1)B)}{V(kB)} \ge \frac{\eta}{2n^2},$$

which is the desired result. ■

We are now ready for our main result, which establishes that the convergence time
of the sequence of vectors $x(k)$ generated by Eq. (1) is of order $O(n^2 B/\eta)$.

Theorem 1 Let Assumptions 1 and 3 hold. Then, there exists an absolute constant$^{4}$ $c$
such that we have

$$V(k) \le \epsilon V(0) \quad \text{for all } k \ge c\,(n^2/\eta)\, B \log(1/\epsilon).$$

Proof. The result follows immediately from Lemma 5. ■
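
The step from Lemma 5 to Theorem 1 can be spelled out as follows. This is only a sketch of the omitted calculation: the block counter $m$ is notation introduced here, and the explicit factor 2 is one admissible choice that ignores the rounding of $m$ to an integer.

```latex
\begin{align*}
V((k+1)B) &\le \Bigl(1 - \frac{\eta}{2n^2}\Bigr) V(kB)
  && \text{(Lemma 5; trivial if } V(kB) = 0\text{)} \\
V(mB) &\le \Bigl(1 - \frac{\eta}{2n^2}\Bigr)^{m} V(0)
  \le e^{-m\eta/(2n^2)}\, V(0)
  && \text{(induction on } m \text{, and } 1 - a \le e^{-a}\text{)} \\
V(mB) &\le \epsilon\, V(0)
  && \text{whenever } m \ge \frac{2n^2}{\eta} \log\frac{1}{\epsilon}.
\end{align*}
```

Since $V(k)$ is nonincreasing in $k$, the bound $V(k) \le \epsilon V(0)$ then holds for every $k \ge mB$, which gives a convergence time of order $(n^2/\eta) B \log(1/\epsilon)$.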

Recall that, according to Lemma 3, Assumption 2 implies Assumption 3. In view
of this, the convergence time bound of Theorem 1 holds for any $n$ and any sequence
of weights satisfying Assumptions 1 and 2. In the next subsection, we show that this
bound is tight when the stronger Assumption 2 holds.



3.3 Tightness 

The next proposition shows that the convergence time bound of Theorem 1 is tight
under Assumption 2.

4 We say $c$ is an absolute constant when it does not depend on any of the parameters in the problem,
in this case $n$, $B$, $\eta$, $\epsilon$.
Proposition 2 There exist constants $c$ and $n_0$ with the following property. For any
$n \ge n_0$, nonnegative integer $B$, $\eta < 1/2$, and $\epsilon < 1$, there exist a sequence of weight
matrices $A(k)$ satisfying Assumptions 1 and 2, and an initial value $x(0)$ such that if
$V(k)/V(0) \le \epsilon$, then

$$k \ge c\, \frac{n^2}{\eta}\, B \log\frac{1}{\epsilon}.$$

Proof. Let $P$ be the circulant shift operator defined by $P e_i = e_{i+1}$, $P e_n = e_1$, where
$e_i$ is a unit vector with the $i$-th entry equal to 1, and all other entries equal to 0. Consider
the symmetric circulant matrix defined by

$$A = (1 - 2\eta) I + \eta P + \eta P^{-1}.$$

Let $A(k) = A$, when $k$ is a multiple of $B$, and $A(k) = I$ otherwise. Note that this
sequence satisfies Assumptions 1 and 2.
The second largest eigenvalue of $A$ is

$$\lambda_2(A) = 1 - 2\eta + 2\eta \cos\frac{2\pi}{n},$$

(see Eq. (3.7) of [9]). Therefore, using the inequality $\cos x \ge 1 - x^2/2$,

$$\lambda_2(A) \ge 1 - \frac{4\eta\pi^2}{n^2}.$$

For $n$ large enough, the quantity on the right-hand side is nonnegative. Let the initial
vector $x(0)$ be the eigenvector corresponding to $\lambda_2(A)$. Then,

$$\frac{V(k)}{V(0)} = \lambda_2(A)^{2k} \ge \Bigl(1 - \frac{4\eta\pi^2}{n^2}\Bigr)^{2k}.$$

For the right-hand side to become less than $\epsilon$, we need $k = \Omega((n^2/\eta) \log(1/\epsilon))$. This
implies that for $V(k)/V(0)$ to become less than $\epsilon$, we need $k = \Omega((n^2/\eta) B \log(1/\epsilon))$. ■
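
The eigenvalue computation in the proof can be checked numerically for the case where $A$ is applied at every step ($B = 1$). The sketch below is illustrative: the values `n = 50`, `eta = 0.25`, `k = 100` and the helper names are assumptions made for this example. It uses the vector with entries $\cos(2\pi i/n)$, which is an eigenvector of the circulant matrix $A$ for the eigenvalue $\lambda_2(A)$, and compares the observed variance ratio with $\lambda_2(A)^{2k}$.

```python
import math


def circulant_step(x, eta):
    """Apply A = (1 - 2*eta) I + eta P + eta P^{-1}; indices wrap mod n."""
    n = len(x)
    return [(1 - 2 * eta) * x[i] + eta * x[(i - 1) % n] + eta * x[(i + 1) % n]
            for i in range(n)]


def lambda2(n, eta):
    """Second largest eigenvalue of A, as given in the proof."""
    return 1 - 2 * eta + 2 * eta * math.cos(2 * math.pi / n)


def sample_variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)


n, eta, k = 50, 0.25, 100  # hypothetical parameters
x0 = [math.cos(2 * math.pi * i / n) for i in range(n)]  # eigenvector of A

x = x0
for _ in range(k):
    x = circulant_step(x, eta)

ratio = sample_variance(x) / sample_variance(x0)
predicted = lambda2(n, eta) ** (2 * k)
lower_bound = (1 - 4 * eta * math.pi ** 2 / n ** 2) ** (2 * k)
```

For these parameters the variance ratio is still far from zero after 100 steps, consistent with the need for a number of iterations of order $n^2/\eta$.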



4 Saving a factor of n: faster averaging on undirected graphs

In the previous section, we have shown that a large class of averaging algorithms have
$O(B(n^2/\eta) \log(1/\epsilon))$ convergence time. Moreover, we have shown that this bound is tight,
in the sense that there exist matrices satisfying Assumptions 1 and 3 which converge in
$\Omega(B(n^2/\eta) \log(1/\epsilon))$.

In this section, we consider decentralized ways of synthesizing the weights $a_{ij}(k)$ while
satisfying Assumptions 1 and 3. Our focus is on improving convergence time bounds by
constructing "good" schemes.

We assume that the communications of the nodes are governed by an exogenous
sequence of graphs $G(k) = (N, \mathcal{E}(k))$ that provides strong connectivity over time periods
of length $B$. This sequence of graphs constrains the matrices $A(k)$ that we can use;
in particular, we require that $a_{ij}(k) = 0$ if $(j, i) \notin \mathcal{E}(k)$. Naturally, we assume that
$(i, i) \in \mathcal{E}(k)$ for every $i$.

Several such decentralized protocols exist. For example, each node may assign

$$a_{ij}(k) = \epsilon, \quad \text{if } (j, i) \in \mathcal{E}(k) \text{ and } i \ne j,$$

$$a_{ii}(k) = 1 - \epsilon \cdot \deg(i),$$

where $\deg(i)$ is the degree of $i$ in $G(k)$. If $\epsilon$ is small enough and the graph $G(k)$ is
undirected [i.e., $(i, j) \in \mathcal{E}(k)$ if and only if $(j, i) \in \mathcal{E}(k)$], this results in a nonnegative,
doubly stochastic matrix (see [H]). However, if a node has $\Theta(n)$ neighbors, $\eta$ will be of
order $\Theta(1/n)$, resulting in $\Theta(n^3)$ convergence time. Moreover, this argument applies to
all protocols in which nodes assign equal weights to all their neighbors; see [21] and [2]
for more examples.
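
The equal-weight protocol above, and the reason it forces $\eta$ to scale like $1/n$, can be seen on a star graph. The sketch below is an illustration only: the function `equal_weight_matrix`, the star topology, and the choice $\epsilon = 1/n$ are assumptions made for this example (any $\epsilon$ below $1/(n-1)$ keeps the diagonal of the center node positive).

```python
def equal_weight_matrix(n, edges, eps):
    """Build A with a_ij = eps on undirected edges and a_ii = 1 - eps * deg(i).

    `edges` is a collection of undirected pairs (i, j) with i != j.
    """
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = eps
        A[j][i] = eps
    for i in range(n):
        deg = sum(1 for j in range(n) if j != i and A[i][j] > 0)
        A[i][i] = 1.0 - eps * deg
    return A


def min_positive_entry(A):
    return min(a for row in A for a in row if a > 0)


# Hypothetical star graph: node 0 is adjacent to all other nodes.
n = 5
star_edges = [(0, j) for j in range(1, n)]
A_star = equal_weight_matrix(n, star_edges, eps=1.0 / n)
eta_star = min_positive_entry(A_star)  # lower bound eta for this matrix
```

The center node has $n - 1$ neighbors, so its self-weight is $1 - (n-1)/n = 1/n$ and the resulting $\eta$ is of order $1/n$, as claimed in the text.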

In this section, we examine whether it is possible to synthesize the weights $a_{ij}(k)$ in
a decentralized manner, so that $a_{ij}(k) \ge \eta$ whenever $a_{ij}(k) \ne 0$, where $\eta$ is a positive
constant independent of $n$ and $B$. We show that this is indeed possible, under the
additional assumption that the graphs $G(k)$ are undirected. Our algorithm is
data-dependent, in that $a_{ij}(k)$ depends not only on the graph $G(k)$, but also on the data
vector $x(k)$. Furthermore, it is a decentralized 3-hop algorithm, in that $a_{ij}(k)$ depends
only on the data at nodes within a distance of at most 3 from $i$. Our algorithm is
such that the resulting sequences of vectors $x(k)$ and graphs $G(k) = (N, \mathcal{E}(k))$, with
$\mathcal{E}(k) = \{(j, i) \mid a_{ij}(k) > 0\}$, satisfy Assumptions 1 and 3. Thus, a convergence time
result can be obtained from Theorem 1.

4.1 The algorithm 

The algorithm we present here is a variation of an old load balancing algorithm (see [8] 
and Chapter 7.3 of P)@ 

At each step of the algorithm, each node offers some of its value to its neighbors, 
and accepts or rejects such offers from its neighbors. Once an offer from $i$ to $j$, of size
$\delta > 0$, has been accepted, the updates $x_i \leftarrow x_i - \delta$ and $x_j \leftarrow x_j + \delta$ are executed.

We next describe the formal steps the nodes execute at each time k. For clarity, 
we refer to the node executing the steps below as node C. Moreover, the instructions 
below sometimes refer to the neighbors of node C; this always means current neighbors 
at time k, when the step is being executed, as determined by the current graph G(k). 
We assume that at each time k, all nodes execute these steps in the order described 
below, while the graph remains unchanged. 

Balancing Algorithm: 

1. Node C broadcasts its current value $x_C$ to all its neighbors.

2. Going through the values it just received from its neighbors, node C finds the
smallest value that is less than its own. Let D be a neighbor with this value. Node
C makes an offer of $(x_C - x_D)/3$ to node D.

If no neighbor of C has a value smaller than $x_C$, node C does nothing at this stage.

3. Node C goes through the incoming offers. It sends an acceptance to the sender of
a largest offer, and a rejection to all the other senders. It updates the value of $x_C$
by adding the value of the accepted offer.

If node C did not receive any offers, it does nothing at this stage.

4. If an acceptance arrives for the offer made by node C, node C updates $x_C$ by
subtracting the value of the offer.

5 This algorithm was also considered in [TH], but in the absence of a result such as Theorem 1, a
weaker convergence time bound was derived.

Note that the new value of each node is a linear combination of the values of its
neighbors. Furthermore, the weights $a_{ij}(k)$ are completely determined by the data and
the graph at most 3 hops from node i in G(k). This is true because in the course
of execution of the above steps, each node makes at most three transmissions to its
neighbors, so the new value of node C cannot depend on information more than 3 hops
away from C.
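The four steps can be sketched as one synchronous round in Python. This is a minimal illustration, not the authors' implementation: the names `balancing_round` and `neighbors` are hypothetical, and ties (which the text leaves open with "a neighbor" and "a largest offer") are broken here by lowest node index as an assumption.

```python
from fractions import Fraction

def balancing_round(x, neighbors):
    """One synchronous round of the balancing steps 1-4.

    x         -- list of current node values
    neighbors -- neighbors[i] is the set of current neighbors of node i
                 (undirected: j in neighbors[i] iff i in neighbors[j])
    Returns the list of updated values.
    """
    n = len(x)
    # Steps 1-2: each node offers (x_i - x_D)/3 to a neighbor D holding the
    # smallest value, provided that value is strictly below its own.
    offer_to = [None] * n
    offer_size = [0] * n
    for i in range(n):
        lower = [j for j in neighbors[i] if x[j] < x[i]]
        if lower:
            d = min(lower, key=lambda j: (x[j], j))  # assumed tie-break
            offer_to[i] = d
            offer_size[i] = (x[i] - x[d]) / 3
    # Step 3: each node accepts one largest incoming offer.
    accepted_from = [None] * n
    for j in range(n):
        senders = [i for i in range(n) if offer_to[i] == j]
        if senders:
            accepted_from[j] = max(senders, key=lambda i: (offer_size[i], -i))
    # Steps 3-4: the receiver gains exactly what the sender loses.
    new_x = list(x)
    for j in range(n):
        i = accepted_from[j]
        if i is not None:
            new_x[j] += offer_size[i]
            new_x[i] -= offer_size[i]
    return new_x

# Path graph 0 - 1 - 2 with values (9, 3, 0).
x0 = [Fraction(9), Fraction(3), Fraction(0)]
x1 = balancing_round(x0, [{1}, {0, 2}, {1}])
```

In this example node 0 offers 2 to node 1 and node 1 offers 1 to node 2; both offers are accepted, so the sum of the values is unchanged.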

4.2 Performance analysis 

In the following theorem, we are able to remove a factor of n from the worst-case
convergence time bounds of Theorem 1.

Theorem 2 Consider the balancing algorithm, and suppose that $G(k) = (N, \mathcal{E}(k))$ is a
sequence of undirected graphs such that $(N, \mathcal{E}(kB) \cup \mathcal{E}(kB+1) \cup \cdots \cup \mathcal{E}((k+1)B-1))$
is connected, for all integers k. There exists an absolute constant c such that we have

$$ V(k) \le \epsilon V(0) \quad \text{for all } k \ge c\, n^2 B \log(1/\epsilon). $$

Proof. Note that with this algorithm, the new value at some node i is a convex com- 
bination of the previous values of itself and its neighbors. Furthermore, the algorithm 
keeps the sum of the nodes' values constant, because every accepted offer involves an 
increase at the receiving node equal to the decrease at the offering node. These two 
properties imply that the algorithm can be written in the form 

$$ x(k+1) = A(k)\,x(k), $$

where A(k) is a doubly stochastic matrix, determined by G(k) and x(k). It can be seen
that the diagonal entries of A(k) are positive and, furthermore, all nonzero entries of
A(k) are larger than or equal to 1/3; thus, $\eta = 1/3$.
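To make the matrix form concrete, the following sketch builds A(k) from the list of offers accepted in one round and checks the two claimed properties. The helper names `matrix_from_accepted` and `is_doubly_stochastic` are hypothetical, and the list of accepted (sender, receiver) pairs is taken as given rather than recomputed from the graph.

```python
from fractions import Fraction

def matrix_from_accepted(n, accepted):
    """Build A(k) from the accepted offers of one round.

    accepted -- list of (sender, receiver) pairs; in the balancing algorithm
                each node is a sender at most once and a receiver at most once.
    """
    third = Fraction(1, 3)
    A = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for s, r in accepted:
        # Sender:   x_s <- x_s - (x_s - x_r)/3
        # Receiver: x_r <- x_r + (x_s - x_r)/3
        A[s][s] -= third
        A[s][r] += third
        A[r][r] -= third
        A[r][s] += third
    return A

def is_doubly_stochastic(A):
    n = len(A)
    rows_ok = all(sum(row) == 1 for row in A)
    cols_ok = all(sum(A[i][j] for i in range(n)) == 1 for j in range(n))
    nonneg = all(a >= 0 for row in A for a in row)
    return rows_ok and cols_ok and nonneg

# Path 0 - 1 - 2 with x = (9, 3, 0): the offers 0 -> 1 and 1 -> 2 are accepted.
A = matrix_from_accepted(3, [(0, 1), (1, 2)])
x = [Fraction(9), Fraction(3), Fraction(0)]
Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
smallest_nonzero = min(a for row in A for a in row if a != 0)
```

A node that both sends and receives an accepted offer keeps a diagonal weight of exactly 1/3, which is why no nonzero entry falls below 1/3.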

We claim that the algorithm [in particular, the sequence $\mathcal{E}(A(k))$] satisfies
Assumption 3. Indeed, suppose that at time kB, the nodes are reordered so that the
values $x_i(kB)$ are nonincreasing in i. Fix some $d \in \{1, \ldots, n-1\}$, and suppose that
$x_d(kB) \ne x_{d+1}(kB)$. Let $S^+ = \{1, \ldots, d\}$ and $S^- = \{d+1, \ldots, n\}$.

Because of our assumptions on the graphs G(k), there will be a first time t in the
interval $\{kB, \ldots, (k+1)B-1\}$, at which there is an edge in $\mathcal{E}(t)$ between some $i^* \in S^+$
and $j^* \in S^-$. Note that between times kB and t, the two sets of nodes, $S^+$ and $S^-$,
do not interact, which implies that $x_i(t) \ge x_d(kB)$, for $i \in S^+$, and $x_j(t) < x_d(kB)$, for
$j \in S^-$.

At time t, node $i^*$ sends an offer to a neighbor with the smallest value; let us denote
that neighbor by $k^*$. Since $(i^*, j^*) \in \mathcal{E}(t)$, we have $x_{k^*}(t) \le x_{j^*}(t) < x_d(kB)$, which
implies that $k^* \in S^-$. Node $k^*$ will accept the largest offer it receives, which must come
from a node with a value no smaller than $x_{i^*}(t)$, and therefore no smaller than $x_d(kB)$;
hence the latter node belongs to $S^+$. It follows that $\mathcal{E}(A(t))$ contains an edge between
$k^*$ and some node in $S^+$, showing that Assumption 3 is satisfied.

The claimed result follows from Theorem 1, because we have shown that all of the
assumptions in that theorem are satisfied with $\eta = 1/3$. ■

5 Quantization Effects 

In this section, we consider a quantized version of the update rule (1). This model is
a good approximation for a network of nodes communicating through finite bandwidth
channels, so that at each time instant, only a finite number of bits can be transmitted.
We incorporate this constraint in our algorithm by assuming that each node, upon
receiving the values of its neighbors, computes the convex combination $\sum_{j=1}^n a_{ij}(k) x_j(k)$
and quantizes it. This update rule also captures a constraint that each node can only
store quantized values.

Unfortunately, under Assumptions 1 and 2, if the output of Eq. (1) is rounded to the
nearest integer, the sequence x(k) is not guaranteed to converge to consensus; see [11].
We therefore choose a quantization rule that rounds the values down, according to

$$ x_i(k+1) = \left\lfloor \sum_{j=1}^n a_{ij}(k)\, x_j(k) \right\rfloor, \qquad (14) $$

where $\lfloor \cdot \rfloor$ represents rounding down to the nearest multiple of 1/Q, and where Q is some
positive integer.
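The difference between the two rounding rules can be seen on a three-node example. The sketch below is an illustration constructed for this purpose, not an example taken from the cited reference: the weight matrix `A`, the starting vector, and the function names are all assumptions. Exact rational arithmetic is used so that rounding is unambiguous.

```python
from fractions import Fraction
from math import floor

def quantize_down(y, Q):
    """Round y down to the nearest multiple of 1/Q."""
    return Fraction(floor(y * Q), Q)

def quantize_nearest(y, Q):
    """Round y to the nearest multiple of 1/Q (ties rounded up)."""
    return Fraction(floor(y * Q + Fraction(1, 2)), Q)

def step(A, x, quantize, Q):
    """x_i(k+1) = quantize(sum_j a_ij * x_j(k))."""
    return [quantize(sum(a * v for a, v in zip(row, x)), Q) for row in A]

t = Fraction(1, 3)
# Hypothetical doubly stochastic weights on the path graph 0 - 1 - 2.
A = [[2 * t, t, 0], [t, t, t], [0, t, 2 * t]]
x0 = [Fraction(0), Fraction(1), Fraction(2)]

# Rounding to the nearest integer leaves (0, 1, 2) unchanged forever.
stuck = step(A, x0, quantize_nearest, 1)

# Rounding down reaches consensus in three steps.
x = x0
for _ in range(3):
    x = step(A, x, quantize_down, 1)
```

With rounding down the nodes agree on 0, while the initial average is 1; this gap is the quantization error.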

We adopt the natural assumption that the initial values are already quantized. 
Assumption 4 For all i, $x_i(0)$ is a multiple of 1/Q.
For convenience we define

$$ U = \max_i x_i(0), \qquad L = \min_i x_i(0). $$

We use K to denote the total number of relevant quantization levels, i.e.,

$$ K = (U - L)\,Q, $$

which is an integer by Assumption 4.
5.1 A quantization level dependent bound

We first present a convergence time bound that depends on the quantization level Q. 

Proposition 3 Let Assumptions^ [H and^hold. Let {x(k)} be generated by the update 
rule (14). If $k \ge nBK$, then all components of x(k) are equal.

Proof. Consider the nodes whose initial value is U. There are at most n of them. As
long as not all entries of x(k) are equal, then every B iterations, at least one node must
use a value strictly less than U in an update; such a node will have its value decreased
to U - 1/Q or less. It follows that after nB iterations, the largest node value will be
at most U - 1/Q. Repeating this argument, we see that at most nBK iterations are
possible before all the nodes have the same value. ■
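The bound nBK can be compared with an actual run on a small instance. The sketch below assumes a fixed path graph (so B = 1) with hypothetical weights of 1/3, and Q = 2; `quantized_step` and `steps_to_consensus` are illustrative names.

```python
from fractions import Fraction
from math import floor

def quantized_step(A, x, Q):
    """Update rule (14): average with weights A, then round down to 1/Q."""
    return [Fraction(floor(sum(a * v for a, v in zip(row, x)) * Q), Q)
            for row in A]

def steps_to_consensus(A, x0, Q, max_steps):
    """Return the first k at which all entries of x(k) agree, or None."""
    x = list(x0)
    for k in range(max_steps + 1):
        if len(set(x)) == 1:
            return k
        x = quantized_step(A, x, Q)
    return None

t = Fraction(1, 3)
A = [[2 * t, t, 0], [t, t, t], [0, t, 2 * t]]    # fixed graph, so B = 1
x0 = [Fraction(0), Fraction(0), Fraction(1)]     # multiples of 1/Q
n, B, Q = 3, 1, 2

K = (max(x0) - min(x0)) * Q      # number of relevant quantization levels
bound = n * B * K                # the bound of Proposition 3
k = steps_to_consensus(A, x0, Q, int(bound))
```

Here K = 2 and the bound is 6, while consensus is reached after 2 steps: the bound is a worst-case guarantee, not an estimate of typical behavior.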

Although the above bound gives informative results for small K, it becomes weaker 
as Q (and, therefore, K) increases. On the other hand, as Q approaches infinity, the 
quantized system approaches the unquantized system; the availability of convergence 
time bounds for the unquantized system suggests that similar bounds should be possible 
for the quantized one. Indeed, in the next subsection, we adopt a notion of convergence 
time parallel to our notion of convergence time for the unquantized algorithm; as a 
result, we obtain a bound on the convergence time which is independent of the total 
number of quantization levels. 

5.2 A quantization level independent bound 

We adopt a slightly different measure of convergence for the analysis of the quantized
consensus algorithm. For any $x \in \mathbb{R}^n$, we define $m(x) = \min_i x_i$ and

$$ \underline{V}(x) = \sum_{i=1}^n (x_i - m(x))^2. $$

We will also use the simpler notation m(k) and $\underline{V}(k)$ to denote m(x(k)) and $\underline{V}(x(k))$,
respectively, where it is more convenient to do so. The function $\underline{V}$ will be our Lyapunov
function for the analysis of the quantized consensus algorithm. The reason for not
using our earlier Lyapunov function, V, is that for the quantized algorithm, V is not
guaranteed to be monotonically nonincreasing in time. On the other hand, we have that
$V(x) \le \underline{V}(x) \le 4nV(x)$ for any $x \in \mathbb{R}^n$ (see footnote 6). As a consequence, any
convergence time bounds expressed in terms of $\underline{V}$ translate to essentially the same bounds
expressed in terms of V, up to a logarithmic factor.
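As a quick sanity check of the sandwich inequality between the two Lyapunov functions, the following sketch evaluates both on a few vectors. The function names `V` and `V_lower` are chosen here for illustration; `V` is the sum of squared deviations from the mean, as in the unquantized analysis.

```python
from fractions import Fraction

def V(x):
    """Sum of squared deviations from the mean."""
    mean = sum(x) / len(x)
    return sum((xi - mean) ** 2 for xi in x)

def V_lower(x):
    """Sum of squared deviations from the minimum entry."""
    m = min(x)
    return sum((xi - m) ** 2 for xi in x)

def sandwich_holds(x):
    n = len(x)
    return V(x) <= V_lower(x) <= 4 * n * V(x)

examples = [
    [Fraction(0), Fraction(0), Fraction(3)],
    [Fraction(1), Fraction(1), Fraction(1)],
    [Fraction(5), Fraction(-2), Fraction(0), Fraction(1)],
]
results = [(V(x), V_lower(x), sandwich_holds(x)) for x in examples]
```

For the first vector, V = 6 and the new function equals 9, well inside the range [6, 72] allowed by the inequality.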

Before proceeding, we record an elementary fact which will allow us to relate the
variance decrease V(x) - V(y) to the decrease, $\underline{V}(x) - \underline{V}(y)$, of our new Lyapunov
function. The proof involves simple algebra, and is therefore omitted.

6 The first inequality follows because $\sum_i (x_i - z)^2$ is minimized when z is the mean of the vector x;
to establish the second inequality, observe that it suffices to consider the case when the mean of x is
0 and V(x) = 1. In that case, the largest distance between m(x) and any $x_i$ is 2 by the triangle
inequality, so $\underline{V}(x) \le 4n$.
Lemma 6 Let $u_1, \ldots, u_n$ and $w_1, \ldots, w_n$ be real numbers satisfying

$$ \sum_{i=1}^n u_i = \sum_{i=1}^n w_i. $$

Then, the expression

$$ f(z) = \sum_{i=1}^n (u_i - z)^2 - \sum_{i=1}^n (w_i - z)^2 $$

is a constant, independent of the scalar z.
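For completeness, the algebra behind the lemma can be sketched in one line. Expanding both squares, the terms in $z^2$ cancel and the terms linear in z vanish by the hypothesis on the sums:

$$ f(z) = \sum_{i=1}^n u_i^2 - \sum_{i=1}^n w_i^2 - 2z\left(\sum_{i=1}^n u_i - \sum_{i=1}^n w_i\right) = \sum_{i=1}^n u_i^2 - \sum_{i=1}^n w_i^2. $$

The right-hand side does not involve z, so f takes the same value at any two choices of z.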

Our next lemma places a bound on the decrease of the Lyapunov function $\underline{V}(t)$
between times kB and (k+1)B - 1.

Lemma 7 Let Assumptions 1, 3, and 4 hold. Let {x(k)} be generated by the update
rule (14). Suppose that the components $x_i(kB)$ of the vector x(kB) have been ordered
from largest to smallest, with ties broken arbitrarily. Then, we have

$$ \underline{V}(kB) - \underline{V}((k+1)B) \ge \frac{\eta}{2} \sum_{i=1}^{n-1} \bigl(x_i(kB) - x_{i+1}(kB)\bigr)^2. $$

Proof. For all k, we view Eq. (14) as the composition of two operators:

$$ y(k) = A(k)\,x(k), $$

where A(k) is a doubly stochastic matrix, and

$$ x(k+1) = \lfloor y(k) \rfloor, $$

where the quantization $\lfloor \cdot \rfloor$ is carried out componentwise.

We apply Lemma 6 with the identification $u_i = x_i(k)$, $w_i = y_i(k)$. Since multiplication
by a doubly stochastic matrix preserves the mean, the condition $\sum_i u_i = \sum_i w_i$ is
satisfied. By considering two different choices for the scalar z, namely, $z_1 = \bar{x}(k) = \bar{y}(k)$
and $z_2 = m(k)$, we obtain

$$ V(x(k)) - V(y(k)) = \underline{V}(x(k)) - \sum_{i=1}^n (y_i(k) - m(k))^2. \qquad (15) $$

Note that $x_i(k+1) - m(k) \le y_i(k) - m(k)$. Therefore,

$$ \underline{V}(x(k)) - \sum_{i=1}^n (y_i(k) - m(k))^2 \le \underline{V}(x(k)) - \sum_{i=1}^n (x_i(k+1) - m(k))^2. \qquad (16) $$

Furthermore, note that since $x_i(k+1) \ge m(k+1) \ge m(k)$ for all i, we have that
$x_i(k+1) - m(k+1) \le x_i(k+1) - m(k)$. Therefore,

$$ \underline{V}(x(k)) - \sum_{i=1}^n (x_i(k+1) - m(k))^2 \le \underline{V}(x(k)) - \underline{V}(x(k+1)). \qquad (17) $$
By combining Eqs. (15), (16), and (17), we obtain

$$ V(x(t)) - V(y(t)) \le \underline{V}(x(t)) - \underline{V}(x(t+1)) \quad \text{for all } t. $$

Summing the preceding relations over t = kB, ..., (k+1)B - 1, we further obtain

$$ \sum_{t=kB}^{(k+1)B-1} \bigl(V(x(t)) - V(y(t))\bigr) \le \underline{V}(x(kB)) - \underline{V}(x((k+1)B)). $$

To complete the proof, we provide a lower bound on the expression

$$ \sum_{t=kB}^{(k+1)B-1} \bigl(V(x(t)) - V(y(t))\bigr). $$

Since y(t) = A(t)x(t) for all t, it follows from Lemma 1 that for any t,

$$ V(x(t)) - V(y(t)) = \sum_{i<j} w_{ij}(t)\,(x_i(t) - x_j(t))^2, $$

where $w_{ij}(t)$ is the (i,j)-th entry of $A(t)^T A(t)$. Using this relation and following the
same line of analysis used in the proof of Lemma 4 [where the required ordering relation
on the values $x_i(t)$ holds in view of the assumption that $x_i(kB)$ is a multiple of 1/Q for
all $k \ge 0$, cf. Assumption 4], we obtain the desired result. ■
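The identity from Lemma 1 used in the last step can be checked numerically on a small instance. The sketch below uses a hypothetical 3-by-3 doubly stochastic matrix and exact rational arithmetic; the helper names are illustrative.

```python
from fractions import Fraction

def V(x):
    """Sum of squared deviations from the mean."""
    mean = sum(x) / len(x)
    return sum((xi - mean) ** 2 for xi in x)

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def gram(A):
    """W = A^T A, so W[i][j] = sum_k A[k][i] * A[k][j]."""
    n = len(A)
    return [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def weighted_pair_sum(A, x):
    """sum over i < j of w_ij * (x_i - x_j)^2, with w_ij from A^T A."""
    W = gram(A)
    n = len(x)
    return sum(W[i][j] * (x[i] - x[j]) ** 2
               for i in range(n) for j in range(i + 1, n))

t = Fraction(1, 3)
A = [[2 * t, t, 0], [t, t, t], [0, t, 2 * t]]   # hypothetical weights
x = [Fraction(0), Fraction(0), Fraction(3)]

variance_drop = V(x) - V(matvec(A, x))
pair_sum = weighted_pair_sum(A, x)
```

Both quantities equal 4 here: V drops from 6 to 2, and the pairs (0,2) and (1,2) contribute 1 and 3 to the weighted sum.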

The next theorem contains our main result on the convergence time of the quantized 
algorithm. 



Theorem 3 Let Assumptions 1, 3, and 4 hold. Let {x(k)} be generated by the update
rule (14). Then, there exists an absolute constant c such that we have

$$ \underline{V}(k) \le \epsilon\, \underline{V}(0) \quad \text{for all } k \ge c\,(n^2/\eta)\,B \log(1/\epsilon). $$

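Proof. Let us assume that $\underline{V}(kB) > 0$. From Lemma 7, we have

$$ \underline{V}(kB) - \underline{V}((k+1)B) \ge \frac{\eta}{2} \sum_{i=1}^{n-1} \bigl(x_i(kB) - x_{i+1}(kB)\bigr)^2, $$

where the components $x_i(kB)$ are ordered from largest to smallest. Since
$\underline{V}(kB) = \sum_{i=1}^n (x_i(kB) - x_n(kB))^2$, we have

$$ \frac{\underline{V}(kB) - \underline{V}((k+1)B)}{\underline{V}(kB)} \ge \frac{\eta}{2}\, \frac{\sum_{i=1}^{n-1} (x_i(kB) - x_{i+1}(kB))^2}{\sum_{i=1}^{n} (x_i(kB) - x_n(kB))^2}. $$

Let $y_i = x_i(kB) - x_n(kB)$. Clearly, $y_i \ge 0$ for all i, and $y_n = 0$. Moreover, the
monotonicity of $x_i(kB)$ implies the monotonicity of $y_i$:

$$ y_1 \ge y_2 \ge \cdots \ge y_n = 0. $$

Thus,

$$ \frac{\underline{V}(kB) - \underline{V}((k+1)B)}{\underline{V}(kB)} \ge \frac{\eta}{2} \min_{y_1 \ge y_2 \ge \cdots \ge y_n = 0} \frac{\sum_{i=1}^{n-1} (y_i - y_{i+1})^2}{\sum_{i=1}^{n} y_i^2}. $$

Next, we simply repeat the steps of the analogous lemma for the unquantized algorithm.
We can assume without loss of generality that $\sum_{i=1}^n y_i^2 = 1$. Define $z_i = y_i - y_{i+1}$ for
i = 1, ..., n-1 and $z_n = 0$. We have that the $z_i$ are all nonnegative and
$\sum_i z_i = y_1 - y_n \ge 1/\sqrt{n}$. Therefore,

$$ \frac{\eta}{2} \min_{y_1 \ge \cdots \ge y_n = 0,\ \sum_i y_i^2 = 1} \sum_{i=1}^{n-1} (y_i - y_{i+1})^2 \ge \frac{\eta}{2} \min_{z_i \ge 0,\ \sum_i z_i \ge 1/\sqrt{n}} \sum_{i=1}^{n} z_i^2. $$

The minimization problem on the right-hand side has an optimal value of at least $1/n^2$,
and the desired result follows. ■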
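The final minimization step can be probed numerically. The sketch below samples random nonincreasing vectors ending in 0 and compares the ratio of squared gaps to squared entries against $1/n^2$. It is a sanity check under sampling, not a proof; the sampling scheme and the names `ratio` and `random_monotone` are assumptions made for illustration.

```python
import random

def ratio(y):
    """sum_i (y_i - y_{i+1})^2 / sum_i y_i^2 for a vector y with y_n = 0."""
    num = sum((y[i] - y[i + 1]) ** 2 for i in range(len(y) - 1))
    den = sum(v * v for v in y)
    return num / den

def random_monotone(n, rng):
    """A random vector y_1 >= ... >= y_{n-1} >= y_n = 0."""
    vals = sorted((rng.random() for _ in range(n - 1)), reverse=True)
    return vals + [0.0]

rng = random.Random(0)
n = 6
worst = min(ratio(random_monotone(n, rng)) for _ in range(1000))
lower_bound = 1 / n ** 2

# An evenly spaced vector gives 5/55 = 1/11, still above 1/36.
evenly_spaced = ratio([5, 4, 3, 2, 1, 0])
```

The bound $1/n^2$ is not attained in these samples, which is consistent with the proof: it only needs a lower bound of the right order in n.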



5.3 Extensions and modifications 

In this subsection, we comment briefly on some corollaries of Theorem 3.

First, we note that the results of Section 4 immediately carry over to the quantized
case. Indeed, in Section 4, we showed how to pick the weights in a decentralized
manner, based only on local information, so that Assumptions 1 and 3 are satisfied, with
$\eta \ge 1/3$. When using a quantized version of the balancing algorithm, we once again
manage to remove the factor of $1/\eta$ from our upper bound.

Proposition 4 For the quantized version of the balancing algorithm, and under the
same assumptions as in Theorem 2, if $k \ge c\, n^2 B \log(1/\epsilon)$, then $\underline{V}(k) \le \epsilon \underline{V}(0)$, where c
is an absolute constant.

Second, we note that Theorem 3 can be used to obtain a bound on the time until the
values of all nodes are equal. Indeed, we observe that in the presence of quantization,
once the condition $\underline{V}(k) < 1/Q^2$ is satisfied, all components of x(k) must be equal.

Proposition 5 Consider the quantized algorithm and assume that Assumptions 1, 3,
and 4 hold. If $k \ge c\,(n^2/\eta)\,B\,[\log Q + \log \underline{V}(0)]$, then all components of x(k) are
equal, where c is an absolute constant.
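The step from Theorem 3 to Proposition 5 can be sketched as follows; the specific choice of $\epsilon$ is an assumption made here for illustration. Since all values are multiples of 1/Q, if the entries of x(k) are not all equal then some entry exceeds m(k) by at least 1/Q, so that

$$ \underline{V}(k) \ge \frac{1}{Q^2} \quad \text{whenever the entries of } x(k) \text{ are not all equal.} $$

Taking, for instance, $\epsilon = 1/(2 Q^2 \underline{V}(0))$ in Theorem 3 gives $\underline{V}(k) \le \epsilon \underline{V}(0) < 1/Q^2$, and

$$ \log\frac{1}{\epsilon} = \log 2 + 2\log Q + \log \underline{V}(0), $$

which is of the order of $\log Q + \log \underline{V}(0)$ once constant factors are absorbed into c.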



5.4 Tightness 

We now show that the quantization-level independent bound in Theorem 3 is tight, even
when the weaker Assumption 3 is replaced with the stronger Assumption 2.

Proposition 6 There exist absolute constants c and $n_0$ with the following property. For
any nonnegative integer B, $\eta < 1/2$, $\epsilon < 1$, and $n \ge n_0$, there exist a sequence of
weight matrices A(k) satisfying Assumptions 1 and 2, an initial value x(0) satisfying
Assumption 4, and a number of quantization levels Q(n) (depending on n) such that under
the dynamics of Eq. (14), if $\underline{V}(k)/\underline{V}(0) \le \epsilon$, then

$$ k \ge c\, \frac{n^2}{\eta}\, B \log\frac{1}{\epsilon}. $$
Proof. We have demonstrated in Proposition 2 a similar result for the unquantized
algorithm. Namely, we have shown that for n large enough and for any B, $\eta < 1/2$, and
$\epsilon < 1$, there exists a weight sequence $a_{ij}(k)$ and an initial vector x(0) such that the first
time when $V(t) \le \epsilon V(0)$ occurs after $\Omega((n^2/\eta) B \log(1/\epsilon))$ steps. Let $T^*$ be this first
time.

Consider the quantized algorithm under the exact same sequence $a_{ij}(k)$, initialized at
$\lfloor x(0) \rfloor$. Let $\hat{x}_i(t)$ refer to the value of node i at time t in the quantized algorithm under
this scenario, as opposed to $x_i(t)$ which denotes the value in the unquantized algorithm.
Since quantization can only decrease a node's value by at most 1/Q at each iteration, it
is easy to show, by induction, that

$$ x_i(t) \ge \hat{x}_i(t) \ge x_i(t) - t/Q. $$

We can pick Q large enough so that, for $t < T^*$, the vector $\hat{x}(t)$ is as close as desired to
x(t).

Therefore, for $t < T^*$ and for large enough Q, $\underline{V}(\hat{x}(t))/\underline{V}(\hat{x}(0))$ will be arbitrarily
close to $\underline{V}(x(t))/\underline{V}(x(0))$. From the proof of Proposition 2, we see that x(t) is always a
scalar multiple of x(0). Since $\underline{V}(x)/V(x)$ is invariant under multiplication by a constant,
it follows that $\underline{V}(x(t))/\underline{V}(x(0)) = V(x(t))/V(x(0))$. Since this last quantity is above $\epsilon$
for $t < T^*$, it follows that provided Q is large enough, $\underline{V}(\hat{x}(t))/\underline{V}(\hat{x}(0))$ is also above $\epsilon$
for $t < T^*$. This proves the proposition. ■



5.5 Quantization error 

Despite favorable convergence properties of our quantized averaging algorithm (14), the
update rule does not preserve the average of the values at each iteration. Therefore,
the common limit of the sequences $x_i(k)$, denoted by $x_f$, need not be equal to the exact
average of the initial values. We next provide an upper bound on the error between $x_f$
and the initial average, as a function of the number of quantization levels.

Proposition 7 There is an absolute constant c such that for the common limit $x_f$ of
the values $x_i(k)$ generated by the quantized algorithm (14), we have

$$ \left| x_f - \frac{1}{n} \sum_{i=1}^n x_i(0) \right| \le \frac{c}{Q}\, \frac{n^2}{\eta}\, B \log\bigl(Q n (U - L)\bigr). $$

Proof. By Proposition 5, after $O\bigl((n^2/\eta) B \log(Q \underline{V}(x(0)))\bigr)$ iterations, all nodes will
have the same value. Since $\underline{V}(x(0)) \le n(U - L)^2$ and the average decreases by at most
1/Q at each iteration, the result follows. ■

Let us assume that the parameters B, $\eta$, and U - L are fixed. Proposition 7 implies
that as n increases, the number of bits used for each communication, which is proportional
to log Q, needs to grow only as O(log n) to make the error negligible. Furthermore,
this is true even if the parameters B, $1/\eta$, and U - L grow polynomially in n.

For a converse, it can be seen that $\Omega(\log n)$ bits are needed. Indeed, consider n nodes,
with n/2 nodes initialized at 0, and n/2 nodes initialized at 1. Suppose that Q < n/2;
we connect the nodes by forming a complete subgraph over all the nodes with value 0
and exactly one node with value 1; see Figure 2 for an example with n = 6. Then, each
node forms the average of its neighbors. This brings one of the nodes with an initial
value of 1 down to 0, without raising the value of any other nodes. We can repeat this
process, to bring all of the nodes with an initial value of 1 down to 0. Since the true
average is 1/2, the final result is 1/2 away from the true average. Note now that Q can
grow linearly with n, and still satisfy the inequality Q < n/2. Thus, the number of bits
can grow as $\Omega(\log n)$, and yet, independent of n, the error remains 1/2.

Figure 2: Initial configuration. Each node takes the average value of its neighbors.
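The converse construction is easy to simulate. The sketch below follows the description above under one added assumption: each node in the clique averages over the whole clique, itself included, with equal weights. The function name `run_converse_example` is illustrative.

```python
from fractions import Fraction
from math import floor

def run_converse_example(n, Q):
    """n/2 nodes start at 0 and n/2 at 1. In each round a clique is formed
    over all nodes currently at 0 plus one node still at 1; clique members
    average their values and round down to a multiple of 1/Q."""
    if n % 2 != 0 or not Q < n / 2:
        raise ValueError("need even n and Q < n/2")
    half = n // 2
    x = [Fraction(0)] * half + [Fraction(1)] * half
    while any(v == 1 for v in x):
        zeros = [i for i, v in enumerate(x) if v == 0]
        one = next(i for i, v in enumerate(x) if v == 1)
        clique = zeros + [one]
        avg = sum(x[i] for i in clique) / len(clique)   # equals 1/len(clique)
        q = Fraction(floor(avg * Q), Q)                 # rounds down to 0
        for i in clique:
            x[i] = q
    return x

n, Q = 6, 2
final = run_converse_example(n, Q)
true_average = Fraction(1, 2)
error = abs(true_average - final[0])
```

Because the clique always has more than Q members, the clique average 1/|clique| is below 1/Q and rounds down to 0, so every round removes one node at value 1 without raising any other node.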



6 Conclusions 

We studied distributed algorithms for the averaging problem over networks with time- 
varying topology, with a focus on tight bounds on the convergence time of a general class 
of averaging algorithms. We first considered algorithms for the case where agents can 
exchange and store continuous values, and established tight convergence time bounds. 
We next studied averaging algorithms under the additional constraint that agents can 
only store and send quantized values. We showed that these algorithms guarantee
convergence of the agents' values to consensus within some error from the average of the
initial values. We provided a bound on the error that highlights the dependence on the 
number of quantization levels. 

Our paper is a contribution to the growing literature on distributed control of multi- 
agent systems. Quantization effects are an integral part of such systems but, with the 
exception of a few recent studies, have not attracted much attention in the vast litera- 
ture on this subject. In this paper, we studied a quantization scheme that guarantees 
consensus at the expense of some error from the initial average value. We used this 
scheme to study the effects of the number of quantization levels on the convergence time 
of the algorithm and the distance from the true average. 

The framework provided in this paper motivates a number of further research direc- 
tions: 

(a) The algorithms studied in this paper assume that there is no delay in receiving
the values of the other agents, which is a restrictive assumption in network
settings. Understanding the convergence of averaging algorithms and implications of
quantization in the presence of delays is an important topic for future research.

(b) We studied a quantization scheme with favorable convergence properties, that is, 
rounding down to the nearest quantization level. Investigation of other quanti- 
zation schemes and their impact on convergence time and error is left for future 
work. 

(c) The quantization algorithm we adopted implicitly assumes that the agents can 
carry out computations with continuous values, but can store and transmit only 
quantized values. Another interesting area for future work is to incorporate the 
additional constraint of finite precision computations into the quantization scheme. 

(d) Although our bounds are tight in the worst case over all graphs, they do not
guarantee better performance on well-connected graphs than on sparse
graphs with many potential bottlenecks. An interesting question is whether it is
possible to pick averaging algorithms that learn the graph and make optimal
use of its information diffusion properties.